@liangzhenduo

Here i is the index of the task, so it must be multiplied by the number of experts per task, not by the number of tasks.
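The indexing problem can be illustrated with a minimal sketch. It assumes the task-specific experts are stored in one flat list, grouped by task, and looked up with `i * stride + j`. The helper `flat_indices` and the sizes used here are hypothetical and are not taken from net.py.

```python
def flat_indices(task_num, exp_per_task, stride):
    """Return the flat expert index visited for every (task i, expert j) pair."""
    return [
        i * stride + j
        for i in range(task_num)
        for j in range(exp_per_task)
    ]


# Hypothetical sizes: 2 tasks with 3 experts each, so 6 experts in total.
task_num, exp_per_task = 2, 3

# Old behaviour: the stride is the number of tasks.
buggy = flat_indices(task_num, exp_per_task, stride=task_num)
# Fixed behaviour: the stride is the number of experts per task.
fixed = flat_indices(task_num, exp_per_task, stride=exp_per_task)

print(buggy)  # [0, 1, 2, 2, 3, 4]  expert 2 is used twice, expert 5 never
print(fixed)  # [0, 1, 2, 3, 4, 5]  every expert is used exactly once
```

With the task count as the stride, the two formulas agree only when `task_num == exp_per_task`, which may be why the bug is easy to miss. When `task_num` is larger than `exp_per_task`, the old formula can also index past the end of the list.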

i is the task index; multiply it by the number of experts per task rather than the number of tasks
@liangzhenduo liangzhenduo changed the title from "Update net.py fix a network bug" to "Update net.py fix PLE network bug" on Jun 8, 2023

CLAassistant commented Jun 8, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

fixed code style
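The file order may differ between nodes, so split_file_list may hand the same file to more than one node. Sorting the list first guarantees that every node sees the same order, so the per-node splits do not read duplicate files.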
@liangzhenduo liangzhenduo changed the title from "Update net.py fix PLE network bug" to "fix PLE network bug & sort file list for ps trainer" on Apr 15, 2024
@liangzhenduo
Author

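The reader change is needed because, in distributed training, the file order on different nodes may not match (each list is unordered), so after split_file_list different nodes may read the same file. Sorting the list first guarantees that every node sees the same order, so the per-node splits do not read duplicate files.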
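The effect of sorting before splitting can be sketched as follows. This is only an illustration under stated assumptions: `shard_files` is a hypothetical round-robin splitter, not the actual split_file_list implementation, and the file names are made up.

```python
def shard_files(file_list, rank, world_size, sort_first=True):
    """Return the files assigned to node `rank` out of `world_size` nodes.

    Hypothetical round-robin split: node `rank` takes every
    `world_size`-th file, starting at position `rank`.
    """
    ordered = sorted(file_list) if sort_first else list(file_list)
    return ordered[rank::world_size]


# The same four files, listed in a different order on each node.
node0_view = ["part-2", "part-0", "part-3", "part-1"]
node1_view = ["part-1", "part-3", "part-0", "part-2"]

# Without sorting, both nodes end up with part-2 and part-3,
# and part-0 and part-1 are never read.
unsorted0 = shard_files(node0_view, 0, 2, sort_first=False)
unsorted1 = shard_files(node1_view, 1, 2, sort_first=False)

# With sorting, the shards are disjoint and together cover every file.
shard0 = shard_files(node0_view, 0, 2)
shard1 = shard_files(node1_view, 1, 2)

print(unsorted0, unsorted1)  # ['part-2', 'part-3'] ['part-3', 'part-2']
print(shard0, shard1)        # ['part-0', 'part-2'] ['part-1', 'part-3']
```

Any deterministic ordering would work here; sorting is simply the easiest one for every node to reproduce without communication.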

@dachr8

dachr8 commented Sep 4, 2024

The PR is correct. The task_init and exp_init parts also have problems.

  for i in range(0, self.task_num):
      for j in range(0, self.exp_per_task):
-         linear_out = self._param_expert[i * self.task_num + j](
+         linear_out = self._param_expert[i * self.exp_per_task + j](

Correct.
